ABSTRACT

A challenge for large-scale siRNA loss-of-function 
studies is the biological pleiotropy resulting from 
multiple modes of action of siRNA reagents. A ma- 
jor confounding feature of these reagents is the 
microRNA-like translational quelling resulting from 
short regions of oligonucleotide complementarity 
to many different messenger RNAs. We developed 
a computational approach, deconvolution analysis 
of RNAi screening data, for automated quantita- 
tion of off-target effects in RNAi screening data 
sets. Substantial reduction of off-target rates was 
experimentally validated in five distinct biological 
screens across different genome-wide siRNA li- 
braries. A public-access graphical-user-interface has 
been constructed to facilitate application of this al- 
gorithm. 

INTRODUCTION 

Genome-wide high-throughput RNA interference (RNAi) 
screening has been widely applied in biomedical research 
for discovery of drug targets or illumination of unknown 
molecular machinery, and has proven to be an effective 
means for functional annotation of protein-coding genes in 
both normal and disease contexts (1-5). However, a press- 
ing challenge for these studies is maximizing the return of 
accurate gene-level information with a technique that is as- 
sociated with pleiotropic mechanisms of action. For ex- 
ample, multiple studies indicate that individual small in- 
terfering RNAs (siRNAs) often interfere with the expres- 
sion of hundreds of genes through partial sequence comple- 



mentarity that imitates microRNA (miRNA) activity (6,7). 
Therefore, the phenotypic read-outs from siRNA screens 
are usually comprised of both the desired 'on-target' effects 
of intended target gene depletion together with uninten- 
tional 'off-target' effects that are oligonucleotide sequence 
dependent, but target gene-independent. The latter can lead 
to many false positive 'hits' that subsequently obscure in- 
terpretation of the overarching screen results. Time- and 
resource-intensive experimental approaches for target val- 
idation therefore often define the limits of the reliable gene- 
level information from any given screen. Computational ap- 
proaches have been designed which can help identify off- 
targeted transcripts within a given screening effort, and 
therefore lead to the discovery of new genes or pathways as- 
sociated with the phenotype under investigation (8,9). How- 
ever, directly addressing high false positive rates and decon- 
volution of off-target phenomena is still a major bottleneck 
restraining the pace of discovery for functional genomics ef- 
forts. To address this issue, we developed a computational 
approach, Deconvolution Analysis of RNAi screening data 
(DecoRNAi), for automated quantitation of off-target ef- 
fects in RNAi screening data sets. 

MATERIALS AND METHODS 

Data processing 

The DecoRNAi approach was tested in five distinct biological
screens across different genome-wide siRNA libraries. All
data processing and Z score derivations were consistent
with the original publications (1-5).

(1) For the H1155 toxicity screens (1), the screen for host
modulators of H1N1-cytopathogenicity (3) and the HCC4017
toxicity screens (5), raw cell viability data were transformed
to robust Z scores (formula shown below) and adjusted
for batch effects. That is, raw data were grouped by
experimental batch and, within each group, the sample
median and median absolute deviation were used to calculate
the robust Z score. Annotation of all siRNA/miRNA
pools and their associated Z scores can be found in
Supplementary Tables S1, S2, S3 and S6.

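$$ Z = \frac{\text{cell viability} - \text{sample median}}{\text{median absolute deviation (MAD)}} $$

$$ \mathrm{MAD} = \operatorname{median}_i \left( \lvert X_i - \text{sample median} \rvert \right) $$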

(2) For the WNT (int/Wingless) pathway siRNA screen
(4), Z scores were calculated as a standard score cen- 
tered on the population mean of each screening run as 
described by the average of each triplicate experiment 
minus the standard deviation. Annotation of all siRNA 
pools and their associated Z scores can be found in Sup- 
plementary Table S4. 

(3) For the selective autophagy siRNA screen (2), mitochondrial
mass for each cell was approximated by the following formula:
mitochondrial mass ~ $\beta_0 + \beta_1\,\text{Parkin} + \beta_2\,\text{siRNA} + \beta_3\,\text{Parkin} \times \text{siRNA}$.
Two-way ANOVA models were used to identify siRNAs that
decreased Parkin-mediated mitophagy, and Z scores were
calculated as the statistical significance. Annotation of
all siRNA pools and their associated Z scores can be
found in Supplementary Table S5.
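
The batch-wise robust Z score described in item (1) can be sketched as follows. This is a minimal illustration, assuming plain Python lists as input; the function name `robust_z_by_batch` and the batch labels are hypothetical, and no scaling constant is applied to the MAD because the formula in the text does not include one.

```python
from statistics import median


def robust_z_by_batch(values, batches):
    """Robust Z score per batch: (value - batch median) / batch MAD.

    values  : raw measurements (e.g. cell viability), one per well
    batches : batch label for each measurement (hypothetical labels)
    """
    # Group well indices by experimental batch.
    groups = {}
    for i, batch in enumerate(batches):
        groups.setdefault(batch, []).append(i)

    z = [0.0] * len(values)
    for idx in groups.values():
        med = median(values[i] for i in idx)
        mad = median(abs(values[i] - med) for i in idx)
        if mad == 0:
            raise ValueError("MAD is zero; robust Z score undefined for this batch")
        for i in idx:
            z[i] = (values[i] - med) / mad
    return z


# Toy data: two batches with very different scales.
toy_values = [1, 2, 3, 4, 100, 10, 20, 30]
toy_batches = ["a", "a", "a", "a", "a", "b", "b", "b"]
toy_z = robust_z_by_batch(toy_values, toy_batches)
```

Because the median and MAD are computed within each batch, the outlier in batch "a" does not distort the scores of the other wells, which is the motivation for using a robust score.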

DecoRNAi analysis 

The LASSO (least absolute shrinkage and selection operator)
regression approach was adapted to quantify the
strength of seed-linked effects. For this analysis, each Z score
is modeled as a linear combination of an on-target effect
(demonstrated in Supplementary Figure S5) and seed
sequence-based off-target effects. The LASSO regression
model was defined as below:

$$ Z = X\beta + \gamma, \quad \text{subject to } \lVert \beta \rVert_1 \le s $$

where $Z_i$ is the $i$th original Z score, $\beta_j$ is the estimated
off-target effect of the $j$th seed family, $\gamma_i$ is the corrected Z score
(on-target effect) and $\lambda$ is the penalty parameter. $X$ is
denoted as below:

$$ X = [x_{ij}], \quad x_{ij} = \begin{cases} 1, & \text{if siRNA pool } i \text{ contains seed family } j \\ 0, & \text{otherwise} \end{cases} $$

And the solution is given by:

$$ \hat{\beta} = \operatorname{argmin}_{\beta} \left( \lVert Z - X\beta \rVert^2 + \lambda \lVert \beta \rVert_1 \right) $$

For each seed family, we can thus estimate the coefficient
that indicates the strength and direction of predicted
off-target effects. A negative coefficient means the seed family
tends to lower Z scores, and vice versa. Based on empirical
experience, $\lambda$ is set to 0.001 as the default. We annotate
those coefficients with absolute value >1 as indicating
candidate off-target effects for all four datasets shown in this
manuscript. However, all the parameters and cutoff values
are tunable by users.
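
The penalized model above can be illustrated with a small sketch. The text does not say which solver was used, so the coordinate-descent routine below is an assumption, as are the function names and the sparse representation (each pool listed with the seed families it contains). It minimizes the squared residual plus the penalty parameter times the sum of absolute coefficients, and returns the residuals as corrected Z scores.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the LASSO proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_seed_lasso(z_scores, pool_seeds, lam=0.001, n_iter=200, tol=1e-9):
    """Estimate seed-family effects (beta) and corrected Z scores (residuals).

    z_scores   : original Z score of each siRNA pool
    pool_seeds : for each pool, the seed families it contains; this encodes
                 the binary design matrix (entry is 1 if pool i has seed j)
    lam        : penalty parameter (0.001 is the default quoted in the text)
    """
    # Column view of the design matrix: seed family -> member pool indices.
    members = {}
    for i, seeds in enumerate(pool_seeds):
        for seed in set(seeds):
            members.setdefault(seed, []).append(i)

    beta = {seed: 0.0 for seed in members}
    resid = list(z_scores)  # residual = Z - X beta; equals Z while beta is 0

    for _ in range(n_iter):
        max_change = 0.0
        for seed, idx in members.items():
            old = beta[seed]
            # Correlation of this column with the partial residual.
            rho = sum(resid[i] + old for i in idx)
            # Entries are 0/1, so the column's squared norm is len(idx).
            new = soft_threshold(rho, lam / 2.0) / len(idx)
            if new != old:
                delta = new - old
                for i in idx:
                    resid[i] -= delta
                beta[seed] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta, resid


# Toy example with made-up Z scores: one seed family drives low scores.
toy_z = [-4.0, -3.0, -3.5, 0.2, -0.2, 0.0]
toy_pools = [["GUUCCG"], ["GUUCCG"], ["GUUCCG", "ACACAC"],
             ["ACACAC"], ["ACACAC"], ["ACACAC"]]
toy_beta, toy_corrected = fit_seed_lasso(toy_z, toy_pools)
```

In this toy case the coefficient for GUUCCG is strongly negative while the other family is shrunk to zero; applying a user-chosen strength cutoff to the absolute coefficients would then flag only the first family as a candidate off-target seed.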



For LASSO-selected off-target seed families, we further
examine the statistical significance using the
Kolmogorov-Smirnov test (KS-test). Taking Z as a vector of original Z
scores from primary screening, the empirical distribution
function $F_n$ for Z scores from seed family $S$ is defined as:

$$ F_n(z) = \frac{1}{N_S} \sum_{i \in S} I_{Z_i \le z} $$

where $I_{Z_i \le z}$ is the indicator function, equal to 1 if $Z_i \le z$
and equal to 0 otherwise, and $N_S$ is the total number of Z scores
from seed family $S$. The KS statistic for a given cumulative
distribution function $F(z)$ is:

$$ D_n = \sup_z \lvert F_n(z) - F(z) \rvert $$

The statistical significance (P-value) was then determined
by the KS statistic.
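
A minimal sketch of the KS statistic follows. It assumes a two-sample setting in which the reference distribution is the empirical distribution of Z scores from pools outside the seed family; that choice, the function names, and the asymptotic P-value approximation are illustrative assumptions rather than the published implementation.

```python
import bisect
import math


def ecdf(sorted_values, z):
    """Fraction of values less than or equal to z."""
    return bisect.bisect_right(sorted_values, z) / len(sorted_values)


def ks_statistic(family_z, reference_z):
    """Largest absolute gap between the two empirical distribution functions."""
    fam = sorted(family_z)
    ref = sorted(reference_z)
    # Both functions are step functions, so the supremum is reached
    # at one of the observed values.
    return max(abs(ecdf(fam, z) - ecdf(ref, z)) for z in fam + ref)


def ks_pvalue_asymptotic(d, n, m):
    """Asymptotic two-sample P-value from the Kolmogorov series (approximate)."""
    t = d * math.sqrt(n * m / (n + m))
    if t < 0.2:
        return 1.0  # the series is numerically unreliable for tiny t
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * t * t)
                for k in range(1, 101))
    return min(1.0, max(0.0, 2.0 * total))


# Toy example with made-up scores: the family is fully separated from the rest.
family = [-4.0, -3.0, -3.5]
others = [0.2, -0.2, 0.0, 0.5, 1.0]
d_stat = ks_statistic(family, others)
p_approx = ks_pvalue_asymptotic(d_stat, len(family), len(others))
```

For small families an exact P-value would be preferable to the asymptotic series; the sketch only shows how the statistic relates to the two distribution functions.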

Web-based application (Galaxy) 

The DecoRNAi application is available at
http://galaxy.qbrc.org/root?tool_id=sirna_offtarget, which is an open
web-based interface. Analysis parameters can be specified
by users as below:

• InputFile: CSV File containing response variable and 
siRNA sequence data. 

• Strand: Specify the strand orientation for analysis. 

• Lambda: Penalty parameter used in the model. 

• Seed: Range: 1-14. Specify the seed region to be used. 

• Library: Specify siRNA library. Default is custom which 
requires user input sequences. 

• Strength: Specify the cutoff for strength of seed-linked
effect. Must be a positive value.

• Significance: Specify the cutoff for significance (P-value).

Tissue culture, oligo transfection and cell viability assays 

In primary screening, all projects employed a pooled siRNA
targeting strategy. The sequences of the pooled siRNAs are
different. However, by design, the siRNAs within the same
pool should target the same gene by perfectly matching
different locations of the corresponding messenger
RNA (mRNA) (Supplementary Figure S5).

In secondary individual oligo screening, H1155 cells 
were grown in RPMI 1640 (Gibco®) supplemented with 
5% fetal bovine serum (FBS; Atlanta Biologicals) and 1% 
penicillin/streptomycin (Gibco®). All siRNAs were pur- 
chased from Dharmacon. The library contains 24 sets of 
four siRNAs each. The oligos targeting transmembrane 
protein 114 (TMEM114) from Dharmacon were used for 
siRNA negative control. The miR-4633-5p and the synthetic
miRNA were from Ambion. Nontargeting miRNA control
(IN-001005-01-05) was from Dharmacon. For reverse
transfection, 1 µl siRNA (10 µM) in 30 µl serum-free media
(SFM) was mixed with 0.4 µl RNAi Max (Invitrogen) in 10
µl SFM. 40 µl siRNA-reagent mix per well and 5000 cells
per well, from a single cell suspension, were delivered in 100
µl media in 96-well microtiter plates. Cell viability was
measured 96 h post-transfection with CellTiter-Glo (Promega)
according to manufacturer's specifications.
Figure 1. Deconvolution analysis of RNAi screening data (DecoRNAi). (a) Workflow of DecoRNAi analysis. (b) Original phenotypic measurements (Z
scores, etc.) are projected on both on-target effect (green) and off-target effect (red) in a deconvolution pattern (top panel). A penalized linear model is
used to quantify on-target and off-target effects. The resulting seed family scores from a genome-wide H1155 siRNA toxicity screen are shown (bottom
panel). Each dot represents a seed family including all siRNAs sharing the same seed sequence. X-axis is estimated strength of seed-linked effect (off-target
effect) and Y-axis is P value associated with each seed family.



RESULTS 

Here we have designed a data-driven computational approach,
DecoRNAi, to quantify and correct miRNA-mimic
off-target effects from whole-genome RNAi screens
(Figure 1a). DecoRNAi simultaneously estimates gene-level
on-target effects and siRNA oligonucleotide-level off-target
effects based on deconvolution of phenotypic measurements
from primary screening data (Figure 1b). Experimental
evaluation, using thousands of oligonucleotide retests
across five independent whole-genome siRNA screens,
indicated that miRNA mimicry by siRNA oligonucleotides is a
pervasive source of 'off-target' biological responses to siRNA.
Application of DecoRNAi significantly enhanced the
fidelity of single gene-level observations at whole-genome
scale, and is provided here as an open-source tool to
enhance lead discovery accuracy from high-throughput RNAi
screening studies.

A major determinant of the mRNA targets for trans- 
lational suppression by a given miRNA, is partial 
mRNA/miRNA complementarity corresponding to a 
6-nucleotide 'seed sequence' on the 5' end of the miRNA 
(10). By this definition, there is a sum total of 4^6 (4096)
possible nonredundant 'seed sequences' within any siRNA
or small hairpin RNA (shRNA) library collection. As 



there are usually tens to hundreds of thousands of oligos 
in a given screening collection, the presence of a given 
seed sequence within many different siRNA/shRNA 
reagents presents the opportunity to identify 'seed-driven' 
phenotypic associations among reagents within a given 
screen. To begin to test this, we examined a whole-genome 
siRNA toxicity screen in H1155 non-small cell lung cancer 
cells (3) (Supplementary Table S1) as a benchmark for
identification of a reasonable scoring approach. This is a 
cell-based high-throughput RNAi screen to identify genes 
required for lung cancer cell viability. This screen employed 
an arrayed one-gene/one-well commercial siRNA library 
with pools of four independent siRNA oligonucleotide 
duplexes per gene. Seed sequence membership for each of 
the 168 992 oligonucleotides in the library was separately 
defined for each of the 6-mer windows present within each 
19-mer (Supplementary Figure S1a). The toxicity value
for each siRNA pool was calculated as a Z score from 
a triplicate analysis in order to control for position and 
batch effects (11). We employed a two-sample KS-test as a 
nonparametric method to discover oligonucleotide-specific 
effects based on the coherent behavior of groups of siRNA 
pools with oligos sharing a common seed sequence
(Supplementary Figure S1b). Using a confidence threshold of a
false discovery rate (FDR) of 5%, seed family/phenotype 
Figure 2. Experimental evaluation of seed-effect predictions. (a) Density
distributions of consequences on cell viability (X-axis) of siRNA duplexes
containing predicted off-target seed sequences (red curve) versus siRNA
duplexes targeting the same genes but without predicted off-target seed
sequences (green curve). (b) Cell viability in response to individual siRNA
oligo duplexes from 24 gene pools tested in H1155 cells. Red dots indicate
duplexes with an oligonucleotide containing the candidate off-target seed
sequence. Green dots represent siRNA duplexes, designed to target the
same gene, which do not contain the candidate off-target seed sequence. (c)
An oligonucleotide mimic of hsa-miR-4633 significantly inhibited H1155
cell viability. Hsa-miR-4633 contains the seed sequence UAUGCC that
was identified as an off-target seed through DecoRNAi analysis. (d) A
synthetic miRNA mimic designed using the predicted off-target seed
GUUCCG significantly inhibited H1155 cell viability, while a control miRNA
mimic of identical sequence with the exception of the indicated seed region
did not.



associations were detected only within seed families defined
by positions 1-6, 2-7 and 3-8 (Supplementary Figure
S1c). The majority corresponded to a seed definition of
nucleotides 2-7, in keeping with current best estimations of
dominant determinants of miRNA target specificity (10).
Using that definition for seed family membership, there are
4001 unique seed sequences represented among the 168 992
Dharmacon oligonucleotides, with a median frequency of
representation of 27 siRNA pools (Supplementary Figure
S1d). A liability of the KS-test is sensitivity to family size.
This can result in the discovery of associations with very
low P-values but very small effect sizes (Supplementary
Figure S1e-g). To defend against that, we developed
the DecoRNAi algorithm, which estimates the strength and
direction of seed-driven effects using LASSO as a penalized
regression model (see 'Materials and Methods' section).
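
The seed family definition used above can be made concrete with a short sketch. It assumes each oligonucleotide is supplied as the strand of interest written 5' to 3'; the function names and the pool identifiers are hypothetical, and the example 19-mer is the first 19 nucleotides of the miR-4633-5p mature sequence shown in Figure 5.

```python
from collections import defaultdict


def seed_of(oligo, start=2, length=6):
    """Return the seed window beginning at 1-based position `start`."""
    if start < 1 or start + length - 1 > len(oligo):
        raise ValueError("seed window falls outside the oligonucleotide")
    # Normalise DNA-style input (T) to RNA-style (U) for consistent grouping.
    return oligo[start - 1:start - 1 + length].upper().replace("T", "U")


def seed_families(pool_oligos, start=2, length=6):
    """Map each seed sequence to the set of pools containing it."""
    families = defaultdict(set)
    for pool_id, oligos in pool_oligos.items():
        for oligo in oligos:
            families[seed_of(oligo, start, length)].add(pool_id)
    return families


example_19mer = "AUAUGCCUGGCUAGCUCCU"
n_windows = len(example_19mer) - 6 + 1   # 6-mer windows in a 19-mer
n_possible_seeds = 4 ** 6                # all possible 6-mer seeds

# Hypothetical pools: two share the seed UAUGCC at positions 2-7.
toy_pools = {
    "pool_A": [example_19mer],
    "pool_B": ["CUAUGCCAAAAAAAAAAAA"],
    "pool_C": ["GGGGGGGGGGGGGGGGGGG"],
}
toy_families = seed_families(toy_pools)
```

With the default window of positions 2-7, `pool_A` and `pool_B` fall into the same family even though the rest of their sequences differ, which is the grouping the seed-driven analysis relies on.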

Application of DecoRNAi to the H1155 toxicity screen
identified 13 potential seed-driven phenotypic associations
corresponding to 365 siRNA pools (Supplementary
Figure S2a). For experimental evaluation of candidate
seed-driven effects, we chose four 'off-target' seed families
(GUUCCG, UCCAGG, UUGCAG, UAUGCC) for a focused
secondary screen (Supplementary Figure S2c-f).
Twenty-four genes were selected and, for each gene, the
corresponding Dharmacon siRNA pool was retested as four
individual siRNAs for consequences on H1155 cell viability. Of
note, siRNA duplexes containing an oligo with the detected
off-target seed sequence were consistently associated with
stronger consequences on cell viability than the remaining
siRNA duplexes designed to target the same gene (Figure
2a, b; Supplementary Figure S2g). Consistent with this, the
cumulative density function indicates a significant shift in 
the Z scores associated with predicted seed-driven off-target 
effects (Supplementary Figure S2b, P — 4.56 x 10"'^). 

Of the detected seed families, only one (UAUGCC) is 
present within the seed sequence of an annotated human 
miRNA (hsa-miR-4633-5p). As expected, introduction of 
the miRNA mimic corresponding to miR-4633-5p resulted 
in a similar viability defect (Figure 2c). To further test the 
sufficiency of seed sequence identity for induction of
off-target phenotypes, we engineered a synthetic miRNA
corresponding to the validated seed family GUUCCG
(Supplemental Figure S2c, g). This reagent also effectively
inhibited viability of H1155 cells, consistent with a dominant
seed-sequence dependent mode of action (Figure 2d).

We next applied DecoRNAi to four additional whole- 
genome siRNA screens employing distinct biological con- 
texts and endpoint assays. These included a siRNA and 
miRNA mimic screen for host modulators of
H1N1-cytopathogenicity, a siRNA screen for modulators of WNT
reporter gene activation, a siRNA image-based screen for 
selective autophagy factors and one additional screen for 
lung cancer drug target discovery using a distinct whole- 
genome siRNA library. 

The H1N1-cytopathogenicity screen sought to return
genes that modulate influenza virus replication in human
bronchial epithelial cells (3) (Supplementary Tables S2
and S3). For the primary screen, synthetic interactions
of siRNAs and miRNA mimics with H1N1-induced
cytopathogenicity were measured using cell viability as the
endpoint assay. From the siRNA screen, DecoRNAi identi- 
fied 13 significant seed families corresponding to eight syn- 
thetic lethal associations (353 siRNA pools) and five syn- 
thetic viable associations (96 siRNA pools) (Supplemen- 
tary Figure S3a). One of eight synthetic lethal associations 
corresponded to a human miRNA; hsa-miR-491. Of note, 
the hsa-miR-491 mimic was also the best scoring synthetic 
lethal reagent from the miRNA mimic screen (Figure 3a). 

The WNT screen sought to return genes modulating 
WNT pathway activation, and thus employed a highly spe- 
cific endpoint assay using a WNT-specific and a WNT- 
independent reporter gene combination (4) (Supplementary 
Table S4). Here, only one seed-sequence association was 
identified among reagents that selectively inhibited WNT 
reporter activity (Supplementary Figure S3b). This may be
indicative of the narrower biological space that can inter- 
sect the endpoint assay employed in this screening effort, 
and which is therefore less exposed to perturbation by mul- 
tiple seed-family oligonucleotides. 

The autophagy screen is an image-based screen, at
single cell resolution, that sought to identify gene products
required for virophagy by measuring colocalization of
Sindbis virus capsid protein with autophagolysosomes (2)
(Supplementary Table S5). DecoRNAi detected six significant
seed-sequence associations with inhibition of selective au- 
tophagy corresponding to 125 siRNA pools (Supplemen- 
tary Figure S3c). From 28 individual siRNA oligo retests, 
Figure 3. Application of DecoRNAi to additional RNAi screens with distinct biological contexts. (a) Identification and estimation of seed-sequence
dependent off-target effects from a genome-wide siRNA screen for modulators of H1N1-induced cell death in HBEC30 cells. Consequences of 426 microRNA
mimics on HBEC30 cell viability upon H1N1 infection are shown. hsa-miR-491, which contains a predicted off-target seed sequence, is as indicated. Out
of all miRNA mimics, hsa-miR-491 has the lowest cell viability. Annotation of all miRNA mimics and their associated viability scores can be found in
Supplemental Table S3. (b) Identification and estimation of seed-sequence dependent effects from a genome-wide siRNA screen for modulators of selective
autophagy in HeLa cells. Phenotypic Z scores from secondary screens of individual siRNA duplexes with and without predicted off-target seed sequences
are plotted as indicated. (c) DecoRNAi-mediated Z score corrections reduce false positive rates in the autophagy modulator screen. Here, gene targets scoring
positive with two or more confirmed siRNAs out of a total of four are considered to be true positives. The X-axis indicates arbitrary 'hit' thresholds based
on rank-ordered Z scores and the Y-axis indicates the corresponding false positive rate. For example, the false positive rate for the top 20 'hits' rank ordered
by Z score is 24% when using the primary Z score and 17% when using the corrected Z score. (d) Identification and estimation of seed-sequence dependent
off-target effects from a genome-wide siRNA toxicity screen on HCC4017 using an alternative distinct siRNA library. Individual oligos from three genes
in HCC4017 screen data were tested as indicated. Viability phenotypes were uncoupled from target gene knock-down: target genes were knocked down
by all siRNAs; however, only oligos with off-target seeds reduced cell viability.



those belonging to the detected off-target seed families 
trended towards lower Z scores than those designed to tar- 
get same genes but not belonging to the predicted off-target 
seed families (Figure 3b). Importantly, substitution of the 
original Z scores with DecoRNAi-corrected Z scores, which
remove the estimated miRNA-like off-target effects from 
the overall phenotypic measures ('Materials and Meth- 
ods' section), significantly reduced the experimentally de- 
termined false positive rate (Figure 3c). 
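
The false positive rate at a given 'hit' threshold, as described in the Figure 3c legend, can be sketched as below. The sketch assumes that lower Z scores rank first and that a gene counts as a true positive when at least two of its four individual siRNAs are confirmed on retest; the function name, gene labels and numbers are hypothetical toy values, not screen data.

```python
def false_positive_rate_at_top_k(z_by_gene, confirmed_by_gene, k, min_confirmed=2):
    """Fraction of the top-k ranked genes that fail secondary confirmation.

    z_by_gene         : gene -> Z score (primary or corrected)
    confirmed_by_gene : gene -> number of individual siRNAs confirmed on retest
    k                 : number of top-ranked 'hits' to evaluate
    min_confirmed     : confirmations needed to count as a true positive
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # Assumption: the most negative Z scores are the strongest hits.
    ranked = sorted(z_by_gene, key=lambda gene: z_by_gene[gene])
    top = ranked[:k]
    false_hits = sum(1 for gene in top
                     if confirmed_by_gene.get(gene, 0) < min_confirmed)
    return false_hits / len(top)


# Toy values for illustration only.
toy_z = {"A": -5.0, "B": -4.0, "C": -3.0, "D": -2.0, "E": -1.0}
toy_confirmed = {"A": 3, "B": 1, "C": 2, "D": 0, "E": 4}
fpr_top4 = false_positive_rate_at_top_k(toy_z, toy_confirmed, k=4)
```

Running the same function on primary and corrected Z scores over a range of `k` values would produce the two curves compared in the figure.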

To evaluate the performance of DecoRNAi using dif- 
ferent genome-wide siRNA libraries, we examined addi- 
tional toxicity screens designed to identify genes required 
for lung cancer cell viability (5). The non-small cell lung 
cancer cell line HCC4017 was screened for siRNA pools 
from the Ambion 'Silencer-Select' library that significantly 
impaired viability (Supplementary Table S6). DecoRNAi 



identified 10 off-target seeds from the library enriched 
in these screens (Supplementary Figure S3d). Testing of 
60 individual siRNA oligonucleotides consistently showed 
that siRNAs containing these 'off-target' seeds have dramatic
consequences on cell viability as compared to other
siRNAs targeting the same genes (Figure 4a). For individual
siRNA duplexes targeting SIAT7D, BZW2 and
DNAJA4 (Figure 4b), mRNA expression data showed that
all reagents successfully depleted the corresponding target
gene mRNA; however, only oligos containing the predicted
off-target seed had significant viability phenotypes (Figure
3d). miRNA and synthetic miRNA phenotypes were
consistent with a dominant seed-sequence dependent mode of
action (Figure 4b-d).

To facilitate application of the DecoRNAi algorithm, we
have released a public-access web-based graphical user
interface (http://galaxy.qbrc.org/root?tool_id=sirna_offtarget)
for custom analysis (Figure 5). This tool contains
pre-computed seed sequence families for three commonly
employed commercial siRNA libraries. For custom
collections, the tool will compute seed sequence membership
from a user-supplied reagent sequence table. Default
parameters are provided for the DecoRNAi online tool
based on empirical performance, but all parameters
are tunable by users. The output files include global
data visualization, the identified seed family associations,
the siRNA pools containing off-target seed families,
corrected Z scores and the potential miRNAs with
phenotypes of interest (Figure 5, Supplementary User's
Manual). To facilitate local software installation, we also
developed an R package for users to perform custom
analysis, available on our Galaxy webpage
(http://galaxy.qbrc.org/root?tool_id=sirna_offtarget).

Figure 4. Application of DecoRNAi to a distinct siRNA library. (a) Identification and estimation of seed-sequence dependent effects from a genome-wide
siRNA toxicity screen in HCC4017 cells using an alternative siRNA library. Density distributions of consequences on cell viability (X-axis) of siRNA
duplexes containing predicted off-target seed sequences (red curve) versus siRNA duplexes targeting the same genes but without predicted off-target seed
sequences (green curve). (b) Identified off-target seeds UCUGAC and ACAUGU were retested using individual oligos designed to target SIAT7D, BZW2
and DNAJA4. miRNA mimic and synthetic miRNA were also retested. (c) An oligonucleotide mimic of hsa-miR-4256 sharing seed UCUGAC significantly
inhibited HCC4017 cell viability. (d) A synthetic miRNA mimic designed using the predicted off-target seed ACAUGU significantly inhibited HCC4017
cell viability, while a control miRNA mimic of identical sequence with the exception of the indicated seed region did not.

DISCUSSION 

We constructed DecoRNAi to quantify seed-driven off- 
target activity by modeling the enrichment of oligonu- 
cleotide sequence-specific effects from genome-wide RNAi 
primary screen data. The approach does not require ar- 
bitrary phenotypic threshold selection, and combines sta- 
tistical significance of population separation with pheno- 
typic effect size to return biologically meaningful corre- 
lations. We have found that the algorithm performs well 
across diverse phenotypic assays and within distinct reagent 
Identified off-target seed families

Family    Strength    Family Size    P Value
GUUCCG    -3.78       15             4.72E-05
ACUAGU    -1.04       31             0.00116
GUGUAC    -2.02       7              0.003884
UAGGAG    -1.05       43             0.000999
UAUGCC    -1.21       14             0.00047

Annotated miRNAs with phenotypic effect

miRBase ID     Mature Sequence
miR-3923       AACUAGUAAUGUUGGAUUAGGG
miR-3177-5p    UGUGUACACACGUGCCAGGCGCU
miR-4266       CUAGGAGGCCUUGGCC
miR-4633-5p    AUAUGCCUGGCUAGCUCCUC

Other panels: siRNA pools containing GUUCCG; output figure (X-axis: strength of seed-linked effect, -5 to 5).

Figure 5. Illustrations of the web-based graphical user interface for DecoRNAi analysis. Seed families are pre-computed for the Dharmacon Library circa
2005, Dharmacon Library circa 2009 and Ambion Silencer Select. For screens employing these reagents, the only required input is the quantitative screen
measurement for each reagent (for example, normalized Z score). Other libraries can be analyzed upon uploading the library-wide sequence information
for each oligonucleotide or processed shRNA. Parameter settings are user-selected. The output files include global visualization of seed family behavior,
the predicted off-target seed families, the siRNA pools containing off-target seed families, the potential miRNAs sharing common seeds with identified
off-target seed families and the corrected Z scores (not shown here).



collections. As expected, miRNA-like behavior of siRNA 
oligonucleotides was a pervasive feature associated with pri- 
mary screening phenotypes. This was detectable by DecoR- 
NAi, experimentally verifiable, and could be imitated with 
appropriately designed synthetic miRNA-like molecules. 

GESS (genome-wide enrichment of seed sequence
matches) is a recently reported computational tool designed
to identify off-targeted transcripts rather than to isolate
and correct off-target phenotypes (8). We applied the GESS
algorithm in an attempt to employ it for the latter. However,
this method identified no off-target siRNA pools from
either the H1155 toxicity screen or the selective autophagy
screen. In stark contrast, this approach identified 23 807
off-target siRNA pools from the H1N1 cytopathogenicity
screen (Figure 6a-c). However, we anticipate that GESS's
intended utility will dovetail with DecoRNAi, providing 
a mechanism to help identify gene cohorts that are re- 
sponding to siRNAs responsible for seed-sequence driven 
phenotypes. 

Two additional computational efforts designed to deflect
spurious gene-level annotations from large-scale RNAi
screens are ATARiS (Analytic Technique for Assessment of 



RNAi by Similarity) (12) and CSA (Common Seed Analy- 
sis) (13). ATARiS was developed to detect coherent behav- 
ior from multiple shRNAs targeting the same gene. While 
effective, the method is less generalizable outside of pooled 
shRNA screens and requires multi-sample RNAi screens (at 
least 10 samples in their publication). CSA, like DecoRNAi, 
detects correlated biological behavior of siRNAs that share 
the same seed sequence. However, CSA does not account for 
family-size bias with its statistical significance metric.
Integration of statistical significance with the strength and
direction of biological phenotypes is likely an important
consideration for optimized detection of false positives
(Supplementary Figure S4c-e). Furthermore, DecoRNAi
quantifies seed-driven off-target effects by modeling the
on-target effects and off-target effects from all individual siRNA
duplexes in the same gene pool, which is more efficient than
looking at individual siRNA seed families separately. In
support of these considerations, we found that the DecoR- 
NAi corrected Z scores had a significantly better true posi- 
tive rate than CSA corrections (Figure 6d). 

A limitation of DecoRNAi is appropriate representa- 
tion of seed families within a given screening collection 
Figure 6. Comparison with GESS and CSA analysis. GESS analysis of human mRNA 3' UTRs from primary data of the H1155 toxicity screen (a), the
selective autophagy screen (b), and the H1N1 cytopathogenicity screen (c). Each point represents one 3' UTR, with the SMFa value plotted against
the SMFi value. (d) DecoRNAi-mediated Z score corrections reduce false positive rates compared to the CSA approach in the selective autophagy screen. Here,
gene targets scoring positive with two or more confirmed siRNAs out of a total of four are considered to be true positives. The X-axis indicates arbitrary
'hit' thresholds based on rank-ordered Z scores after applying the DecoRNAi approach or the CSA approach, and the Y-axis indicates the corresponding false
positive rate.



to reach sufficient statistical power for detection of phe- 
notypic associations. However, from the cumulative anal- 
ysis of five different whole genome siRNA screens, we esti- 
mate that the DecoRNAi approach will cover ~85% of the 
seed sequence families present in a typical commercial ar- 
rayed siRNA library (Supplementary Figure S4a and b). To 
facilitate automated application of DecoRNAi to siRNA
and shRNA library screening efforts, we have embedded
pre-computed seed family annotations for three commonly
used commercial RNAi libraries (Dharmacon Library circa
2005, Dharmacon Library circa 2009 and Ambion
Silencer). In addition, we provide a tool for automated
generation of seed family annotation of user-specific siRNA or
shRNA oligonucleotide collections
(http://galaxy.qbrc.org/root?tool_id=sirna_offtarget). The tool is based on the Galaxy
open source framework, accepts phenotypic measures (such
as Z scores) from primary screens as input, and allows
iterative parameter choices for data analyses. All of the
user-specified parameters are documented in detail, and the
intermediate outputs are provided for transparent analysis
(Figure 5, Supplementary User's Manual).

In summary, DecoRNAi is a computational tool that fills 
an important unmet need for the functional genomics re- 
search community as it enhances the return of rigorous bio- 
logically meaningful observations downstream of screening 
efforts that otherwise consume huge time and reagent re- 
sources by following bad leads, or by weeding them out us- 
ing strictly empirical approaches. Substantial reduction of 
off-target rates was experimentally validated in five distinct 
biological screens across different genome-wide siRNA li- 
braries. A public-access graphical user interface has been 
constructed to facilitate application of this algorithm by any 
investigator. 
Supplementary Data are available at NAR Online.